Multi-centroidal duration generation algorithm for HMM-based TTS

نویسندگان

  • Yongguo Kang
  • Jian Li
  • Yan Deng
  • Miaomiao Wang
چکیده

A novel method is proposed to improve the duration prediction for HMM based speech synthesis. Based on the decision tree trained by the conventional HTS training method, the duration instances of every leaf node are further clustered into several classes by the K-means clustering method, and the mapping functions between the context features and class labels are trained by CRF. Instead of using the mean value of the Gaussian distribution of a leaf node in the decision tree as the predicted duration, the weighted summation of the multicentroids from these several clustered classes is used to predict the phoneme duration. The weights are given by the output probability provided by CRF according to input context features and the prior probability from the clustering results. Compared with conventional HTS method, experiments show that the proposed method can significantly reduce RMSE in objective evaluations and achieves better preference scores in the subjective evaluations.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improved generation of prosodic features in HMM-based Mandarin speech synthesis

The HMM-based Text-to-Speech System can produce high quality synthetic speech with flexible modeling of spectral and prosodic parameters. However, the prosodic features, like F0 and duration trajectories, generated by HMM-based speech synthesis are often excessively smoothed and lack prosodic variance. In HMM-based TTS durations are typically modeled statistically using state duration probabili...

متن کامل

Based Speech Synthesis

This paper describes a approach to text-to-speech synthesis (TTS) based on HMM. In the proposing approach, speech spectral parameter sequences are generated from HMMs directly based on maximum likelihood criterion. By considering relationship between static and dynamic features during parameter generation, smooth spectral sequences are generated according to the statistics of static and dynamic...

متن کامل

Performance Analysis of Text To Speech Synthesis System Using HMM And Prosody Features With Parsing For Tamil Language

This paper describes a Hidden Markov Model (HMM) based (TTS) system and prosody based (TTS) system for producing natural sounding synthetic speech in Tamil language. The (HMM) based system consists of two phases such as training and synthesis. Tamil speech is first parameterized into spectral and excitation features using Glottal Inverse Filtering (GIF). An emotions present in the input text is...

متن کامل

Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis

In this paper, we describe an HMM-based speech synthesis system in which spectrum, pitch and state duration are modeled simultaneously in a unified framework of HMM. In the system, pitch and state duration are modeled by multi-space probability distribution HMMs and multi-dimensional Gaussian distributions, respectively. The distributions for spectral parameter, pitch parameter and the state du...

متن کامل

A hybrid TTS between unit selection and HMM-based TTS under limited data conditions

The intelligibility of HMM-based TTS can reach that of the original speech. However, HMM-based TTS is far from natural. On the contrary, unit selection TTS is the most-natural sounding TTS currently. However, its intelligibility and naturalness on segmental duration and timing are not stable. Additionally, unit selection needs to store a huge amount of data for concatenation. Recently, hybrid a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013